Direct Text to Speech Translation System Using Acoustic Units

Abstract

This paper proposes a direct text to speech translation system using discrete acoustic units. This framework employs text in different source languages as input to generate speech in the target language without the need for text transcriptions in this language. Motivated by the success of acoustic units in previous works for direct speech to speech translation systems, we use the same pipeline to extract the acoustic units using a speech encoder combined with a clustering algorithm. Once units are obtained, an encoder-decoder architecture is trained to predict them. Then a vocoder generates speech from units. Our approach for direct text to speech translation was tested on the new CVSS corpus with two different text mBART models employed as initialisation. The systems presented report competitive performance for most of the language pairs evaluated. Besides, results show a remarkable improvement when initialising our proposed architecture with a model pre-trained with more languages.
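The unit-extraction step described in the abstract (encoder features passed through a clustering algorithm) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the clustering algorithm, so plain k-means is used; the 2-D vectors stand in for real speech-encoder frames; the function names (`kmeans`, `to_units`, `collapse_repeats`) are hypothetical; and merging consecutive repeated units is a common convention in unit-based systems, not something the abstract states.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def nearest(frame: Vector, centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid closest to `frame` (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, centroids[k])),
    )


def kmeans(frames: Sequence[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Toy k-means (assumed clustering algorithm): returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(list(frames), k)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for f in frames:
            buckets[nearest(f, centroids)].append(f)
        for i, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def to_units(frames: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Map each frame to a discrete unit: the index of its nearest centroid."""
    return [nearest(f, centroids) for f in frames]


def collapse_repeats(units: Sequence[int]) -> List[int]:
    """Merge runs of identical consecutive units (assumed convention)."""
    out: List[int] = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


if __name__ == "__main__":
    # Hypothetical stand-ins for encoder frames and learned centroids.
    frames = [[0.1, 0.2], [0.0, -0.1], [9.8, 10.1], [10.2, 9.9], [0.3, 0.1]]
    centroids = [[0.0, 0.0], [10.0, 10.0]]
    print(collapse_repeats(to_units(frames, centroids)))  # prints [0, 1, 0]
```

In a system of the kind the abstract describes, sequences like these would be the targets that the encoder-decoder model learns to predict from source text, and the vocoder would turn predicted units back into a waveform; neither of those stages is sketched here.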

Related resources

Variable-length acoustic units inference for text-to-speech synthesis

The best voices in text-to-speech synthesis are currently obtained via acoustic units concatenation-based systems. In such systems, the choice of units whose concatenations will produce an acoustic message is a crucial stage. Moreover, it can be observed that current TTS systems use acoustic units which most often correspond to variable-length phonetic descriptions. In this article, an original...

Text to Speech Synthesis of Hindi Language using Polysyllable Units

A Text To Speech (TTS) synthesis is a computer based system that should be able to read any text aloud. Thus TTS technology is essential to those people who are visually impaired. It also plays a very important role in the field of Telecommunication, Industrial and educational applications. Thus TTS has been developed for foreign languages and is well established. As Indian language characters ...

Cantonese text-to-speech synthesis using sub-syllable units

This paper describes our recent investigation on the use of both intra-syllable and cross-syllable acoustic units for Cantonese text-to-speech synthesis. In our previous work, isolated monosyllable units were used for concatenative speech synthesis of Cantonese. The synthetic speech was considered to be unnatural in such a way that there was an obvious lack of perceptual continuity. The propose...

Adapting machine translation models toward misrecognized speech with text-to-speech pronunciation rules and acoustic confusability

In the spoken language translation pipeline, machine translation systems that are trained solely on written bitexts are often unable to recover from speech recognition errors due to the mismatch in training data. We propose a novel technique to simulate the errors generated by an ASR system, using the ASR system’s pronunciation dictionary and language model. Lexical entries in the pronunciation...

An Introduction to Speech Sciences (Acoustic Analysis of Speech)

Speech sciences deal with the acoustical characteristics of speech by means of sophisticated software as well as hardware. Although speech science is well known in developed countries, especially Western societies, it has remained almost unknown in Iran, though in recent years a group of scholars has become involved in this branch of science. The applicati...

Journal

Journal title: IEEE Signal Processing Letters

Year: 2023

ISSN: 1558-2361, 1070-9908

DOI: https://doi.org/10.1109/lsp.2023.3313513